{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "NpJd3dlOCStH"
      },
      "source": [
        "\u003ca href=\"https://colab.research.google.com/github/magenta/ddsp/blob/master/ddsp/colab/tutorials/0_processor.ipynb\" target=\"_parent\"\u003e\u003cimg src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/\u003e\u003c/a\u003e"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "hMqWDc_m6rUC"
      },
      "source": [
        "\n",
        "##### Copyright 2021 Google LLC.\n",
        "\n",
        "Licensed under the Apache License, Version 2.0 (the \"License\");\n",
        "\n",
        "\n",
        "\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "VNhgka4UKNjf"
      },
      "outputs": [],
      "source": [
        "# Copyright 2021 Google LLC. All Rights Reserved.\n",
        "#\n",
        "# Licensed under the Apache License, Version 2.0 (the \"License\");\n",
        "# you may not use this file except in compliance with the License.\n",
        "# You may obtain a copy of the License at\n",
        "#\n",
        "#     http://www.apache.org/licenses/LICENSE-2.0\n",
        "#\n",
        "# Unless required by applicable law or agreed to in writing, software\n",
        "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
        "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
        "# See the License for the specific language governing permissions and\n",
        "# limitations under the License.\n",
        "# =============================================================================="
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ZFIqwYGbZ-df"
      },
      "source": [
        "# DDSP Processor Demo\n",
        "\n",
        "This notebook introduces the signal `Processor()` object, the main object type in the DDSP library. It is the base class for Synthesizers and Effects, which share the following methods:\n",
        "\n",
        "* `get_controls()`: inputs -\u003e controls.\n",
        "* `get_signal()`: controls -\u003e signal.\n",
        "* `__call__()`: inputs -\u003e signal (i.e. `get_signal(**get_controls())`).\n",
        "\n",
        "Where:\n",
        "* `inputs` is a variable number of tensor arguments (depending on the processor), often the outputs of a neural network.\n",
        "* `controls` is a dictionary of tensors scaled and constrained specifically for the processor.\n",
        "* `signal` is an output tensor (usually audio or a control signal for another processor).\n",
        "\n",
        "Let's see why this is a helpful approach by looking at the specific example of the `Harmonic()` synthesizer processor."
      ]
    },
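    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "As a rough illustration of how the three methods fit together, the next cell sketches a toy processor in plain NumPy. The class `ToyGain` and its scaling choice are hypothetical and are not part of the DDSP library; the real base class is `ddsp.processors.Processor`."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import numpy as np\n",
        "\n",
        "\n",
        "class ToyGain:\n",
        "  # Hypothetical processor (not part of DDSP): applies a gain in (0, 1).\n",
        "\n",
        "  def get_controls(self, audio, gain_logits):\n",
        "    # inputs -> controls: squash unconstrained values into a valid range.\n",
        "    gain = 1.0 / (1.0 + np.exp(-gain_logits))\n",
        "    return {'audio': audio, 'gain': gain}\n",
        "\n",
        "  def get_signal(self, audio, gain):\n",
        "    # controls -> signal.\n",
        "    return audio * gain\n",
        "\n",
        "  def __call__(self, audio, gain_logits):\n",
        "    # inputs -> signal.\n",
        "    return self.get_signal(**self.get_controls(audio, gain_logits))\n",
        "\n",
        "\n",
        "toy = ToyGain()\n",
        "toy_audio = np.ones(4)\n",
        "toy_logits = np.array([-10.0, 0.0, 10.0, 100.0])\n",
        "print(toy(toy_audio, toy_logits))"
      ]
    },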
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "cellView": "form",
        "id": "21w0_tyszEtN"
      },
      "outputs": [],
      "source": [
        "#@title Install and import dependencies\n",
        "\n",
        "%tensorflow_version 2.x\n",
        "!pip install -qU ddsp\n",
        "\n",
        "# Ignore a bunch of deprecation warnings\n",
        "import warnings\n",
        "warnings.filterwarnings(\"ignore\")\n",
        "\n",
        "import ddsp\n",
        "import ddsp.training\n",
        "from ddsp.colab.colab_utils import play, specplot, DEFAULT_SAMPLE_RATE\n",
        "import matplotlib.pyplot as plt\n",
        "import numpy as np\n",
        "import tensorflow as tf\n",
        "\n",
        "sample_rate = DEFAULT_SAMPLE_RATE  # 16000"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "AiCIG1x5bxkh"
      },
      "source": [
        "# Example: harmonic synthesizer\n",
        "\n",
        "The harmonic synthesizer models a sound as a linear combination of harmonic sinusoids. Amplitude envelopes are generated with 50% overlapping Hann windows. The final audio is cropped to `n_samples`."
      ]
    },
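    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "To make \"a linear combination of harmonic sinusoids\" concrete, the next cell is a minimal NumPy sketch of additive synthesis with constant controls. It is an illustration under simplifying assumptions, not the DDSP implementation: there is no windowed envelope interpolation and no Nyquist handling, and all `sketch_*` names and values are hypothetical."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import numpy as np\n",
        "\n",
        "# Hypothetical values chosen for this sketch only.\n",
        "sketch_sample_rate = 16000\n",
        "sketch_f0_hz = 220.0\n",
        "sketch_weights = np.array([0.5, 0.3, 0.2])  # One weight per harmonic, summing to 1.\n",
        "\n",
        "# One second of sample times.\n",
        "sketch_time = np.arange(sketch_sample_rate) / sketch_sample_rate\n",
        "\n",
        "# Harmonic k oscillates at k * f0.\n",
        "harmonic_numbers = np.arange(1, len(sketch_weights) + 1)\n",
        "phases = 2.0 * np.pi * sketch_f0_hz * harmonic_numbers[:, np.newaxis] * sketch_time[np.newaxis, :]\n",
        "\n",
        "# Weighted sum of the harmonic sinusoids.\n",
        "sketch_audio = np.sum(sketch_weights[:, np.newaxis] * np.sin(phases), axis=0)\n",
        "print(sketch_audio.shape)"
      ]
    },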
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "59gA5RKMHiS3"
      },
      "source": [
        "## `__init__()`\n",
        "\n",
        "All member variables are initialized in the constructor, which makes it easy to change them as hyperparameters using the [gin](https://github.com/google/gin-config) dependency injection library. All processors also have a `name` that is used by `ProcessorGroup()`."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "mtNivsWq3qtW"
      },
      "outputs": [],
      "source": [
        "n_frames = 1000\n",
        "hop_size = 64\n",
        "n_samples = n_frames * hop_size\n",
        "\n",
        "# Create a synthesizer object.\n",
        "harmonic_synth = ddsp.synths.Harmonic(n_samples=n_samples,\n",
        "                                      sample_rate=sample_rate)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "vuDIPhc58ZQI"
      },
      "source": [
        "\n",
        "## `get_controls()`\n",
        "\n",
        "The outputs of a neural network are often not properly scaled and constrained. The `get_controls()` method returns a dictionary of valid control parameters computed from those outputs.\n",
        "\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "xFPeXRPG6I44"
      },
      "source": [
        "**3 inputs (amps, hd, f0)**\n",
        "* `amplitude`: Amplitude envelope of the synthesizer output.\n",
        "* `harmonic_distribution`: Normalized amplitudes of each harmonic.\n",
        "* `fundamental_frequency`: Frequency in Hz of the base oscillator.\n",
        "\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "4v4q5NuM4JWf"
      },
      "outputs": [],
      "source": [
        "# Generate some arbitrary inputs.\n",
        "\n",
        "# Amplitude [batch, n_frames, 1].\n",
        "# Make amplitude linearly decay over time.\n",
        "amps = np.linspace(1.0, -3.0, n_frames)\n",
        "amps = amps[np.newaxis, :, np.newaxis]\n",
        "\n",
        "# Harmonic Distribution [batch, n_frames, n_harmonics].\n",
        "# Make harmonics decrease linearly with frequency.\n",
        "n_harmonics = 30\n",
        "harmonic_distribution = (np.linspace(-2.0, 2.0, n_frames)[:, np.newaxis] + \n",
        "                         np.linspace(3.0, -3.0, n_harmonics)[np.newaxis, :])\n",
        "harmonic_distribution = harmonic_distribution[np.newaxis, :, :]\n",
        "\n",
        "# Fundamental frequency in Hz [batch, n_frames, 1].\n",
        "f0_hz = 440.0 * np.ones([1, n_frames, 1], dtype=np.float32)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "KVTMi2dX5yFe"
      },
      "outputs": [],
      "source": [
        "# Plot it!\n",
        "time = np.linspace(0, n_samples / sample_rate, n_frames)\n",
        "\n",
        "plt.figure(figsize=(18, 4))\n",
        "plt.subplot(131)\n",
        "plt.plot(time, amps[0, :, 0])\n",
        "plt.xticks([0, 1, 2, 3, 4])\n",
        "plt.title('Amplitude')\n",
        "\n",
        "plt.subplot(132)\n",
        "plt.plot(time, harmonic_distribution[0, :, :])\n",
        "plt.xticks([0, 1, 2, 3, 4])\n",
        "plt.title('Harmonic Distribution')\n",
        "\n",
        "plt.subplot(133)\n",
        "plt.plot(time, f0_hz[0, :, 0])\n",
        "plt.xticks([0, 1, 2, 3, 4])\n",
        "_ = plt.title('Fundamental Frequency')"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ORU6bwfWRDks"
      },
      "source": [
        "Consider the plots above as outputs of a neural network. These outputs violate the synthesizer's expectations:\n",
        "* Amplitude is not \u003e= 0 (non-negative amplitudes avoid phase shifts).\n",
        "* Harmonic distribution is not normalized (normalization factorizes timbre and amplitude).\n",
        "* Fundamental frequency * n_harmonics \u003e Nyquist frequency (440 * 30 = 13200 \u003e 8000), which will lead to [aliasing](https://en.wikipedia.org/wiki/Aliasing).\n"
      ]
    },
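    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The third point is simple arithmetic, worked through in the next cell. With a 16 kHz sample rate the Nyquist frequency is 8000 Hz, so for a 440 Hz fundamental the 18th harmonic (7920 Hz) still fits and the 19th (8360 Hz) does not. The `check_*` names are introduced here for illustration only, and treating a harmonic exactly at the Nyquist frequency as out of range is an assumption of this sketch."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import numpy as np\n",
        "\n",
        "# Values restated from the cells above so this cell runs on its own.\n",
        "check_sample_rate = 16000\n",
        "check_f0_hz = 440.0\n",
        "check_n_harmonics = 30\n",
        "\n",
        "nyquist_hz = check_sample_rate / 2.0\n",
        "\n",
        "# Frequency of every harmonic: f0, 2 * f0, ..., n_harmonics * f0.\n",
        "harmonic_hz = check_f0_hz * np.arange(1, check_n_harmonics + 1)\n",
        "\n",
        "# Count the harmonics that cannot be represented at this sample rate.\n",
        "n_above = int(np.sum(harmonic_hz >= nyquist_hz))\n",
        "\n",
        "print('Nyquist frequency: %.0f Hz' % nyquist_hz)\n",
        "print('Highest harmonic: %.0f Hz' % harmonic_hz[-1])\n",
        "print('Harmonics at or above Nyquist: %d of %d' % (n_above, check_n_harmonics))"
      ]
    },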
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "zrYgCcby_xZg"
      },
      "outputs": [],
      "source": [
        "controls = harmonic_synth.get_controls(amps, harmonic_distribution, f0_hz)\n",
        "print(controls.keys())"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "lnzqmowPB5Lu"
      },
      "outputs": [],
      "source": [
        "# Now let's see what they look like...\n",
        "time = np.linspace(0, n_samples / sample_rate, n_frames)\n",
        "\n",
        "plt.figure(figsize=(18, 4))\n",
        "plt.subplot(131)\n",
        "plt.plot(time, controls['amplitudes'][0, :, 0])\n",
        "plt.xticks([0, 1, 2, 3, 4])\n",
        "plt.title('Amplitude')\n",
        "\n",
        "plt.subplot(132)\n",
        "plt.plot(time, controls['harmonic_distribution'][0, :, :])\n",
        "plt.xticks([0, 1, 2, 3, 4])\n",
        "plt.title('Harmonic Distribution')\n",
        "\n",
        "plt.subplot(133)\n",
        "plt.plot(time, controls['f0_hz'][0, :, 0])\n",
        "plt.xticks([0, 1, 2, 3, 4])\n",
        "_ = plt.title('Fundamental Frequency')"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "wVgLs8_BRuxz"
      },
      "source": [
        "Notice that:\n",
        "* Amplitudes are now all positive.\n",
        "* The harmonic distribution sums to 1.0.\n",
        "* All harmonics that are above the Nyquist frequency now have an amplitude of 0."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "nbY9iIbdDljR"
      },
      "source": [
        "The amplitudes and harmonic distribution are scaled by an \"exponentiated sigmoid\" function (`ddsp.core.exp_sigmoid`). There is nothing particularly special about this function (other functions can be specified as `scale_fn=` during construction), but it has several nice properties:\n",
        "* Output scales exponentially with input, so equal input steps give roughly equal steps in log amplitude (matching the roughly logarithmic human perception of loudness).\n",
        "* Centered at 0, with max and min in a reasonable range for normalized neural network outputs.\n",
        "* Max value of 2.0 to prevent the signal from getting too loud.\n",
        "* Threshold value of 1e-7 for numerical stability during training."
      ]
    },
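    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "For reference, the shape of this function can be written in a few lines of NumPy. The sketch in the next cell assumes the form `max_value * sigmoid(x)**log(exponent) + threshold` with `exponent=10.0`; the name `exp_sigmoid_sketch` is hypothetical, and `ddsp.core.exp_sigmoid` remains the authoritative implementation."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import numpy as np\n",
        "\n",
        "\n",
        "def exp_sigmoid_sketch(x, exponent=10.0, max_value=2.0, threshold=1e-7):\n",
        "  # NumPy stand-in for illustration; the functional form is an assumption.\n",
        "  sigmoid = 1.0 / (1.0 + np.exp(-x))\n",
        "  return max_value * sigmoid**np.log(exponent) + threshold\n",
        "\n",
        "\n",
        "# The output stays between the threshold and max_value + threshold.\n",
        "sketch_x = np.array([-10.0, 0.0, 10.0])\n",
        "print(exp_sigmoid_sketch(sketch_x))"
      ]
    },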
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "r6-dnQ2NEM1P"
      },
      "outputs": [],
      "source": [
        "x = tf.linspace(-10.0, 10.0, 1000)\n",
        "y = ddsp.core.exp_sigmoid(x)\n",
        "\n",
        "plt.figure(figsize=(18, 4))\n",
        "plt.subplot(121)\n",
        "plt.plot(x, y)\n",
        "\n",
        "plt.subplot(122)\n",
        "_ = plt.semilogy(x, y)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "z32JcedbIM9J"
      },
      "source": [
        "## `get_signal()`\n",
        "\n",
        "Synthesizes audio from controls."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "k-lraCkcIjyK"
      },
      "outputs": [],
      "source": [
        "audio = harmonic_synth.get_signal(**controls)\n",
        "\n",
        "play(audio)\n",
        "specplot(audio)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "LSvRsjJUVxOA"
      },
      "source": [
        "## `__call__()` \n",
        "\n",
        "Synthesizes audio directly from the raw inputs. `get_controls()` is called internally to turn them into valid control parameters."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "DBIeyZLLI-RO"
      },
      "outputs": [],
      "source": [
        "audio = harmonic_synth(amps, harmonic_distribution, f0_hz)\n",
        "\n",
        "play(audio)\n",
        "specplot(audio)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "7j5XCUJkK9WZ"
      },
      "source": [
        "# Example: Just for fun... \n",
        "Let's run another example where we tweak some of the controls..."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "2uN7x3wqLBcD"
      },
      "outputs": [],
      "source": [
        "## Some weird control envelopes...\n",
        "\n",
        "# Amplitude [batch, n_frames, 1].\n",
        "amps = np.ones([n_frames]) * -5.0\n",
        "amps[:50] += np.linspace(0, 7.0, 50)\n",
        "amps[50:200] += 7.0\n",
        "amps[200:900] += (7.0 - np.linspace(0.0, 7.0, 700))\n",
        "amps *= np.abs(np.cos(np.linspace(0, 2*np.pi * 10.0, n_frames)))\n",
        "amps = amps[np.newaxis, :, np.newaxis]\n",
        "\n",
        "# Harmonic Distribution [batch, n_frames, n_harmonics].\n",
        "n_harmonics = 20\n",
        "harmonic_distribution = np.ones([n_frames, 1]) * np.linspace(1.0, -1.0, n_harmonics)[np.newaxis, :]\n",
        "for i in range(n_harmonics):\n",
        "  harmonic_distribution[:, i] = 1.0 - np.linspace(i * 0.09, 2.0, n_frames)\n",
        "  harmonic_distribution[:, i] *= 5.0 * np.abs(np.cos(np.linspace(0, 2*np.pi * 0.1 * i, n_frames)))\n",
        "  if i % 2 != 0:\n",
        "    harmonic_distribution[:, i] = -3\n",
        "harmonic_distribution = harmonic_distribution[np.newaxis, :, :]\n",
        "\n",
        "# Fundamental frequency in Hz [batch, n_frames, 1].\n",
        "f0_hz = np.ones([n_frames]) * 200.0\n",
        "f0_hz[:100] *= np.linspace(2, 1, 100)**2\n",
        "f0_hz[200:1000] += 20 * np.sin(np.linspace(0, 8.0, 800) * 2 * np.pi * np.linspace(0, 1.0, 800))  * np.linspace(0, 1.0, 800)\n",
        "f0_hz = f0_hz[np.newaxis, :, np.newaxis]\n",
        "\n",
        "# Get valid controls\n",
        "controls = harmonic_synth.get_controls(amps, harmonic_distribution, f0_hz)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "ovBa3pkUMrAC"
      },
      "outputs": [],
      "source": [
        "# Plot!\n",
        "time = np.linspace(0, n_samples / sample_rate, n_frames)\n",
        "\n",
        "plt.figure(figsize=(18, 4))\n",
        "plt.subplot(131)\n",
        "plt.plot(time, controls['amplitudes'][0, :, 0])\n",
        "plt.xticks([0, 1, 2, 3, 4])\n",
        "plt.title('Amplitude')\n",
        "\n",
        "plt.subplot(132)\n",
        "plt.plot(time, controls['harmonic_distribution'][0, :, :])\n",
        "plt.xticks([0, 1, 2, 3, 4])\n",
        "plt.title('Harmonic Distribution')\n",
        "\n",
        "plt.subplot(133)\n",
        "plt.plot(time, controls['f0_hz'][0, :, 0])\n",
        "plt.xticks([0, 1, 2, 3, 4])\n",
        "_ = plt.title('Fundamental Frequency')"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "Xf7Vc3UtNQ87"
      },
      "outputs": [],
      "source": [
        "audio = harmonic_synth.get_signal(**controls)\n",
        "\n",
        "play(audio)\n",
        "specplot(audio)"
      ]
    }
  ],
  "metadata": {
    "colab": {
      "collapsed_sections": [
        "hMqWDc_m6rUC",
        "ZFIqwYGbZ-df",
        "AiCIG1x5bxkh",
        "59gA5RKMHiS3",
        "vuDIPhc58ZQI",
        "z32JcedbIM9J",
        "LSvRsjJUVxOA",
        "7j5XCUJkK9WZ"
      ],
      "last_runtime": {},
      "name": "0_processor.ipynb",
      "provenance": [],
      "toc_visible": true
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}
